A Pilot Study Exploring Spreadsheet Risk in Scientific Research

Authors

  • Ghada AlTarawneh
  • Simon Thorne

Abstract

This paper discusses the risks and potential impacts of spreadsheet errors in scientific research data in a Neuroscience research centre in the UK. Spreadsheet usage in neuroscience, or indeed in any medical discipline, is a largely unreported area of spreadsheet research. This paper presents a case study exploring the possible risks and impacts of spreadsheet errors in the neuroscience research centre at the University of Newcastle. Data was collected using an online questionnaire with 17 participants and two detailed semi-structured interviews. The analysis highlights that errors in research data may lead to severe impacts such as misleading science and damaged personal and organisational reputations. In addition, many risk factors arise from using spreadsheets, such as inadequate design and a lack of training. Spreadsheets are used widely in business, and the impacts and risks in these fields have been studied and highlighted in detail. However, the significant relationship between scientific research and spreadsheets has not been clarified. The paper also draws out the similarities in spreadsheet practice between the scientific and business communities.

Proceedings of the EuSpRIG 2016 Conference “Spreadsheet Risk Management” ISBN : 978-1-905404-53-7 Copyright © 2016, EuSpRIG European Spreadsheet Risks Interest Group (www.eusprig.org) & the Author(s)

1.0 Introduction

This paper discusses the risks and potential impacts of spreadsheet errors in scientific research data in a Neuroscience research centre in the UK. Spreadsheet usage in neuroscience, or indeed in any medical discipline, is a largely unreported area of spreadsheet research. Although little is published on this subject, it seems likely that the medical discipline makes extensive use of spreadsheets for a variety of clinical and non-clinical activities. This assumption is based on the observation that spreadsheet use is ubiquitous in almost all areas of business, government and education.
To that end, this paper aims to answer the following questions:

1. To what extent are spreadsheets used by the Neuroscience Research Centre at the University of Newcastle for data processing and decision-making activities?
2. How are spreadsheets planned, developed and maintained by the research centre?
3. What are the specific risks and potential sources of error arising from neuroscience spreadsheet use?
4. What is the likely impact of spreadsheet risks and errors on neuroscience research data?

Data was collected using an online questionnaire with 17 participants, followed up by two detailed semi-structured interviews. Given the small sample size, only limited generalisations to the wider neuroscience and medical communities can be made. However, this research gathers some specific, interesting data and highlights important areas for further research.

1.1 Neuroscience

Neuroscience was defined by the U.S. Congress as “the study of the nervous system, how it affects behaviour, and how it is affected by disease. The goal of neuroscience is to define and understand the continuum from molecular to cell to behaviour” (Congress, 1984). Neuroscience research has led to discoveries about several aspects of biological and human cognitive development, and its findings have informed the treatment of many medical and mental health conditions. Neuroscience did not gain appropriate attention as a new science until the last decade. In fact, studying the brain and the nervous system sets the foundation for many other studies. Psychology, for example, is a wide and rich discipline, and many recent neuroscience studies and experiments on the brain have explained irrational attitudes and behaviours by studying precursor neural circuit activity (Diamond and Amso, 2008). Moreover, the impact of neuroscience on the wider medical discipline has been significant.
Neuroscience research has made important contributions to the understanding of many medical conditions and has highlighted new avenues of research in multiple medical disciplines (Conn, 2008).

1.2 Research Data Confidentiality and Integrity

As with other areas of research, confidentiality and data protection are of the utmost importance in neuroscience. There are several sets of standards and recommendations to that end, such as the guidelines published by the National Human Research Protections Advisory Committee (NHRPAC, 2002). These guidelines highlight the importance of research data and, therefore, the risks associated with it. However, there are no explicit guidelines for spreadsheets. Research Data Services at the University of Wisconsin-Madison (WISC, 2014) discuss spreadsheet risks, errors and research data, and have published a set of guidelines and recommendations for using spreadsheets in order to minimize these errors in research data. Although these guidelines offer some basic advice, they are far from detailed.

Research data in the organization is very important, and losing it could lead to consequences such as:

  • Losing valuable experimental data or simulation results.
  • Drawing incorrect conclusions from data, leading to negative research impact.
  • Damage to the reputation of the organization conducting the research.
  • Potential for regulatory or legal action from governmental or other institutions.
  • Requirement for corrective actions or repairs, which could take years of research.
  • Violation of University or organization mission, policy or principles.

This lack of detailed advice is in contrast to the regulation and control of spreadsheet applications in other medical fields, such as the pharmaceutical industry in the United States.
The pharmaceutical industry has highly specific controls placed on the use of spreadsheet applications for data analysis, reporting and decision making under Title 21 of the Code of Federal Regulations Part 11 (21 CFR 11). Spreadsheets are explicitly mentioned in this legislation, which demands that companies provide evidence of audits, validation, electronic signatures and documentation for any software artefact, including spreadsheets. The legislation also dictates that electronic artefacts such as spreadsheets be stored on a secure server so that, once a spreadsheet has been created and audited, it cannot be changed without authorisation.

1.3 Neuroscience Data Management Strategies

There are many approaches available to manage and distribute research data. Spreadsheets are one of the most popular data manipulation and analysis tools used among researchers. Lacroix and Critchlow (2003) discuss spreadsheets as one of the two popular data management strategies in research. According to their research, spreadsheets offer quick data browsing, simple mathematical operations and easy distribution to collaborators. However, they also highlight several disadvantages, in particular the lack of validation when data is entered, which can increase the possibility of errors.

Anderson et al. (2007) interviewed 286 researchers from different research fields, including neuroscience. The majority of the researchers interviewed admitted that they rely on general-purpose applications such as spreadsheets to manage their data. The main reasons for this reliance are the simplicity of the interface, the range and power of the data manipulation tools and the short learning curve.
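The lack of validation at data entry that Lacroix and Critchlow (2003) list as a disadvantage can be made concrete with a minimal sketch: each incoming row is checked for missing, non-numeric and out-of-range values before it is accepted. The column names (`participant_age`, `reaction_time_ms`), their ranges and the `validate_row` helper are hypothetical, chosen only for illustration; this is not how any particular spreadsheet package implements validation.

```python
# Hypothetical columns and inclusive (min, max) ranges, for illustration only.
RULES = {
    "participant_age": (18, 90),
    "reaction_time_ms": (100.0, 3000.0),
}


def validate_row(row: dict) -> list[str]:
    """Return the problems found in one data row; an empty list means it passes."""
    problems = []
    for column, (low, high) in RULES.items():
        raw = row.get(column)
        # Reject blank or absent cells instead of silently treating them as zero.
        if raw is None or str(raw).strip() == "":
            problems.append(f"{column}: missing value")
            continue
        # Reject entries that do not parse as a number, e.g. '4l2' typed for 412.
        try:
            value = float(raw)
        except (TypeError, ValueError):
            problems.append(f"{column}: not a number ({raw!r})")
            continue
        # Reject plausible-looking numbers that fall outside the allowed range.
        if not low <= value <= high:
            problems.append(f"{column}: {value} outside [{low}, {high}]")
    return problems


if __name__ == "__main__":
    rows = [
        {"participant_age": "34", "reaction_time_ms": "412"},
        {"participant_age": "3.4", "reaction_time_ms": "412"},  # misplaced decimal point
        {"participant_age": "41", "reaction_time_ms": "4l2"},   # letter 'l' typed for '1'
    ]
    for number, row in enumerate(rows, start=1):
        print(number, validate_row(row) or "ok")
```

Each rejected row carries a readable reason, so a slip such as `3.4` for `34` is caught when it is entered rather than discovered, if at all, during analysis.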
According to one of the interviewees, “Yeah, the spreadsheet has been our main workhorse, unfortunately”. Anderson et al. (2007) also note that several of the interviewees had encountered problems when using spreadsheets:

“Well, we have multiple spreadsheets that’s one of the problems. We sort of have a master spreadsheet ... We try to minimize it as much as we can, but I think that’s a major problem.”

“However, that exceeds the capabilities of the spreadsheet. Spreadsheet really bogs down any time you get past say 20,000 individual cells with columns.”

“Well, it’s very cumbersome, I can’t print anything, I’d have to paste it together. I end up just doing a freeze frame so that I can scroll this way.”

Although Anderson et al. (2007) do not mention the wealth of research on spreadsheet error, these quotes make clear that users experience problems with concurrency, computational power and usability.

Spreadsheets are widely used in research for tabulating, analysing and sharing data: “Recent research in multiple disciplines shows that the use of spreadsheets to store and structure numeric and text data is commonplace” (WISC, 2014). The main reason is probably the same reason that businesses rely on spreadsheets: they provide a fast way to test hypotheses, plot data, conduct pilot experiments and prototype ideas. Moreover, they are ‘easy’ to learn and affordable.

There are alternatives to spreadsheets for medical statistical analysis, such as SPSS, R and Matlab. Both R and Matlab resemble programming languages and require detailed knowledge of syntax and argument construction to be wielded effectively. SPSS requires less specialised knowledge and resembles a spreadsheet. However, SPSS costs significantly more per licence than Microsoft Excel (SPSS starts at $1170 USD per year as a subscription, whereas Excel costs $331 for a full, user-owned copy).
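As a rough illustration of the licence-cost gap, the sketch below accumulates the two list prices quoted above over several years. It assumes those prices stay fixed and ignores upgrades, discounts and institutional site licences, so it shows the shape of the difference rather than any real budget; the `cumulative_cost` helper is hypothetical.

```python
SPSS_PER_YEAR = 1170  # USD per year, subscription price quoted in the text
EXCEL_ONE_OFF = 331   # USD, one-off purchase price quoted in the text


def cumulative_cost(years: int) -> tuple[int, int]:
    """Return (spss_total, excel_total) in USD after the given number of years.

    Assumes fixed prices and no Excel upgrade purchases during the period.
    """
    return SPSS_PER_YEAR * years, EXCEL_ONE_OFF


for years in (1, 3, 5):
    spss, excel = cumulative_cost(years)
    print(f"{years} year(s): SPSS ${spss}, Excel ${excel}, ratio {spss / excel:.1f}x")
```

Even in the first year the quoted SPSS subscription costs about 3.5 times the one-off Excel price, and the gap widens each year the subscription is renewed.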
Between the usability, flexibility, perceived ‘easiness’ and licence cost of spreadsheet software, it is no surprise that spreadsheet software is the de facto choice for data analysis in the medical field. Indeed, it is for these same reasons that spreadsheet software is used extensively in the business world.

1.4 Spreadsheet Risks

Spreadsheet software is amongst the most utilised commercial software in organisations worldwide. Users cite ‘ease of use’ and a wide range of functionality that could replace many complex information systems. The business world makes extensive use of spreadsheet software for data processing, analysis, decision science and data storage needs (SERP, 2006). Spreadsheets are, however, prone to multiple risks, and since they are standalone files, they lack system-wide controls out of the box. Almost any employee can create, access, manipulate and distribute spreadsheet data. Hence, almost any employee can make a small but risky error while manually entering data or configuring formulas (Deloitte, 2009).

Spreadsheet errors are prevalent and can cause crises in organisations. Spreadsheet error rates and cell error rates are high, and many real cases worldwide show that these errors can cause serious problems in the business world (EuSpRIG, 2016). Many years of research have made it clear that spreadsheet use can carry multiple serious risks (Panko, 2008). The risk of making a simple mistake appears to be high: field studies show that up to 90% of spreadsheet models contain at least one error (Panko, 2008). Spreadsheets are created and used without proper documentation, and organisations generally do not have strict criteria governing their use.
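Why a modest cell error rate (CER) can leave most models containing at least one error can be sketched with simple probability. Assuming, as a simplification, that each of n formula cells is wrong independently with probability CER, the chance that a model contains at least one error is 1 - (1 - CER)^n. The 1% CER below and the helper names are illustrative assumptions, not figures or methods from this study.

```python
import math


def p_at_least_one_error(cer: float, n_cells: int) -> float:
    """Probability that at least one of n_cells formula cells is wrong.

    Assumes every cell is wrong independently with the same probability cer,
    so P(no error) = (1 - cer) ** n_cells.
    """
    return 1.0 - (1.0 - cer) ** n_cells


def cells_until(target: float, cer: float) -> int:
    """Smallest number of cells for which P(at least one error) >= target."""
    if not (0.0 < cer < 1.0 and 0.0 < target < 1.0):
        raise ValueError("cer and target must lie strictly between 0 and 1")
    # Solve 1 - (1 - cer) ** n >= target for n, then round up to a whole cell.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - cer))


if __name__ == "__main__":
    cer = 0.01  # illustrative assumption: 1% of formula cells contain an error
    for n in (10, 50, 200):
        print(f"{n:>4} cells: {p_at_least_one_error(cer, n):.1%} chance of >= 1 error")
    print("cells for a 90% chance:", cells_until(0.9, cer))
```

Under these assumptions, about 230 formula cells at a 1% CER already give a 90% chance of at least one error. This is purely arithmetic and does not reproduce how the cited field studies were conducted; correlated errors in real models would change the figures.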
Loss of data is a particularly dangerous risk: holding all data in spreadsheets without a centralised storage and data recovery environment could lead to a crisis in any financial or non-financial system. Data availability environments should also be created to ensure business continuity in the event of data loss.

Unskilled users can also be considered a risk, since business spreadsheets are designed by both IS and non-IS professionals. The important issue here is that non-IS professionals are unlikely to be trained in information systems development methods, meaning that the process of creating a spreadsheet is far more ad hoc and is unlikely to follow the standards dictated by software engineering. Indeed, research shows that almost all spreadsheet modellers have no formal training (SERP, 2006). Because of this, it is impossible to guarantee that any one spreadsheet modeller adheres to standards or that a particular spreadsheet model is valid. Research shows that most errors arise not from mistakes in programming the spreadsheet but from the misapplication of programming logic (Panko, 2008). This makes the lack of user training in formal development methods even more critical, since logic errors, once committed, are difficult to find and correct. Without knowing how to test and debug a spreadsheet, a user is unlikely to notice and correct such a mistake.

1.5 The Pilot Study

This paper considers the research environment in the neuroscience department at the University of Newcastle. Research within this department is diverse, ranging from the basic biology of neurons to the abnormal activity associated with epilepsy, from music perception to mood disorders, from visual object recognition to retinal prostheses for the blind, and from animal decision-making to anaesthesia to neurological disease.
The department has various tools at its disposal, including:

  • Brain scanning (MRI)
  • Cellular imaging and electrophysiology
  • Computational modeling
  • Molecular genetics
  • Animal and human behavioral laboratories
  • Psychophysics

The neuroscience department consists of 17 researchers, comprising undergraduate, postgraduate and post-doctoral students. There are also a number of senior academics within the department. The department produces world-class research, publishing papers in leading neuroscience outlets such as the Journal of Neuroscience, Brain and Language, and Physics of Life Reviews.

1.6 Research Materials and Methods

A number of different research materials are employed to gather relevant information. First, an in-depth questionnaire was distributed to all members of the research centre. Two in-depth, 45-minute interviews were also conducted with senior members of the department to supplement this information.

Journal: CoRR
Volume: abs/1703.09785
Publication date: 2017